University of Manitoba: description of the PIE system used for MUC-6

Author

  • Dekang Lin
Abstract

The PIE (Principar-driven Information Extraction) system takes a different approach to the problem of information extraction from the NUBA system that was used in MUC-5. The NUBA system did not have a parser and relied on an abductive reasoner to construct the semantic relationships between domain-specific concepts mentioned in a sentence. The PIE system, on the other hand, relies heavily on a principle-based broad-coverage parser, called PRINCIPAR [2, 6, 8], that we have developed over the past three years. Most of the extracted information is "read off" directly from the parser outputs by a subtree pattern matcher, bypassing the usual step of constructing semantic representations.

In spite of the radical difference between the high-level approaches of our MUC-5 and MUC-6 systems, over 85% of the code of the MUC-5 system was reused. This is largely due to the unified abductive view of different tasks in natural language understanding [7]. PRINCIPAR and the abductive semantic interpreter share the same message-passing algorithm for abduction. They differ only in the contents of the messages and the constraints on message propagation.

The architecture of the PIE system is shown in Figure 1. The processing is sequential. A text is first broken up into sentences. The lexical analyzer turns a stream of tokens into a lattice of lexical items. The lexical items may then be combined or deleted by lexical rules, which are responsible for recognizing named entities. PRINCIPAR takes the lattice of lexical items and outputs a dependency tree between the words in a sentence. PRINCIPAR attempts to construct a parse for the full sentence. However, when it fails to do so, it retrieves parse fragments that cover the complete sentence. Information is extracted from the dependency trees by a subtree pattern matcher.

The format of the NE and CO outputs does not meet the standard of the scoring software because of arbitrary insertion and deletion of white space. A separate program is used to resolve the differences.

The PIE system is implemented in about 38k lines of C++, of which about 33k to 34k lines were written before MUC-6. It contains an interpreter for LISP-like expressions, so that all the knowledge structures, such as the finite automata for finding sentence boundaries, the lexicon, the lexical rules, the grammar network, and the extraction rules, are written in LISP-like expressions.
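The idea of reading information directly off a dependency tree with a subtree pattern matcher can be illustrated with a minimal Python sketch. This is not PIE's or PRINCIPAR's actual code or data format: the `Node` and `Pattern` classes, the relation labels (`subj`, `obj`), the slot names, and the example sentence are all assumptions made for illustration. The matcher is also deliberately simple: each child pattern is matched independently against the first child that fits, with no backtracking.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A word in a dependency tree plus the relation linking it to its head."""
    word: str
    rel: str = "root"
    children: List["Node"] = field(default_factory=list)


@dataclass
class Pattern:
    """A subtree pattern. A field left as None places no constraint."""
    word: Optional[str] = None   # required word, if any
    rel: Optional[str] = None    # required relation to the head, if any
    slot: Optional[str] = None   # name under which the matched word is recorded
    children: List["Pattern"] = field(default_factory=list)


def match(node: Node, pat: Pattern) -> Optional[Dict[str, str]]:
    """Return slot bindings if `pat` matches the subtree at `node`, else None."""
    if pat.word is not None and pat.word != node.word:
        return None
    if pat.rel is not None and pat.rel != node.rel:
        return None
    bindings = {pat.slot: node.word} if pat.slot else {}
    for child_pat in pat.children:
        # Take the first child that satisfies this child pattern.
        for child in node.children:
            sub = match(child, child_pat)
            if sub is not None:
                bindings.update(sub)
                break
        else:
            return None  # no child satisfied this part of the pattern
    return bindings


def find_all(root: Node, pat: Pattern) -> List[Dict[str, str]]:
    """Try the pattern at every node of the tree and collect all bindings."""
    results, stack = [], [root]
    while stack:
        node = stack.pop()
        found = match(node, pat)
        if found is not None:
            results.append(found)
        stack.extend(node.children)
    return results


# Hypothetical dependency tree for "Smith succeeded Jones".
tree = Node("succeeded", "root", [Node("Smith", "subj"), Node("Jones", "obj")])

# Hypothetical extraction pattern: the subject and object of "succeeded".
pattern = Pattern(word="succeeded", children=[
    Pattern(rel="subj", slot="incoming"),
    Pattern(rel="obj", slot="outgoing"),
])

print(find_all(tree, pattern))
```

Because the bindings come straight from the tree nodes, no intermediate semantic representation is built, which is the design point the abstract emphasizes.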

Similar resources

American University in Cairo: Description of the American University in Cairo's System Used for MUC-7

Portions of the American University in Cairo's MUC-7 system, MUC7-Plink, have participated in every Message Understanding Competition since MUC-4. The Plink parser was developed at the University of Michigan where it formed the core of the systems entered in MUC-4 [2] and MUC-5 [1]. Recently, the Plink parser was added to GATE [6] to facilitate interaction between language processing modules. M...

University of Durham: Description of the LOLITA system as Used in MUC-7

LOLITA has been designed in such a way that the code implementing the MUC tasks is only a small part of the whole system. A core system provides complex facilities with the MUC system being built so that it utilises these facilities. Hence, after some background to the LOLITA project, the ‘core’ of LOLITA is described. This system description is substantially similar to that given for MUC-6 [1]...

University of Sheffield: Description of the LaSIE-II System as Used for MUC-7

The University of Sheffield NLP group took part in MUC-7 using the LaSIE-II system, an evolution of the LaSIE (Large Scale Information Extraction) system first created for participation in MUC-6 [9] and part of a larger research effort into information extraction underway in our group. LaSIE-II was used to carry out all five of the MUC-7 tasks and was, in fact, the only system to take part in all of t...

Description of the UMass system as used for MUC-6

Information extraction research at the University of Massachusetts is based on portable, trainable language processing components. Some components are more effective than others, some have been under development longer than others, but in all cases, we are working to eliminate manual knowledge engineering. Although UMass has participated in previous MUC evaluations, all of our information extra...

University of Maryland/ConQuest: description of the ICTOAN system as used for MUC-4

The ICTOAN system is a natural language processing system developed jointly by ConQuest, Inc. and the University of Maryland Baltimore County. The system was written from scratch during the first five months of 1992 using an estimated eight person-months of labor. The template generation routines were reused from our MUC-3 system [1], providing leverage of perhaps one person-month. Adaptatio...

Publication date: 1995